The Committee on Human Factors was established in October
1980  by  the  Commission  on  Behavioral  and  Social  Sciences 
and  Education  of  the  National  Research  Council.  It  is 
sponsored  by  the  Office  of  Naval  Research,  the  Air  Force 
Office  of  Scientific  Research,  the  Army  Research  Institute 
for  the  Behavioral  and  Social  Sciences,  the  National 
Aeronautics  and  Space  Administration,  and  the  National 
Science  Foundation.  The  principal  objectives  of  the 
committee  are  to  provide  new  perspectives  on  theoretical 
and  methodological  issues,  identify  basic  research  needed 
to  expand  and  strengthen  the  scientific  basis  of  human 
factors,  and  to  attract  scientists  both  inside  and 
outside  the  field  to  perform  needed  research.  The  goal 
of  the  committee  is  to  provide  the  solid  foundation  of 
research  as  a  base  on  which  effective  human  factors 
practices  can  build. 

Human  factors  issues  arise  in  every  domain  in  which 
humans  interact  with  the  products  of  a  technological 
society.  In  order  for  the  committee  to  perform  its  role 
effectively,  it  draws  on  experts  from  a  wide  range  of 
scientific  and  engineering  disciplines.  The  committee 
includes  specialists  in  the  fields  of  psychology, 
engineering,  biomechanics,  cognitive  sciences,  machine 
intelligence,  computer  sciences,  sociology,  and  human 
factors  engineering.  Other  disciplines  participate  in 
the  working  groups,  workshops,  and  symposia  organized  by 
the  committee.  Each  of  these  disciplines  contributes  to 
the  basic  data,  theory,  and  methods  required  to  improve 
the  scientific  basis  of  human  factors. 
Computers are pervasive in civilian and military
equipment  systems.  The  compatibility  of  computer-based 
devices  and  human  users  is  predominantly  dependent  on  the 
characteristics  of  the  software.  The  term  software 
human  factors  refers  to  the  process  of  designing 
software  to  be  effective  for  human  use,  i.e.,  easy  to 
learn  and  use,  productive,  and  efficient.  However,  no 
specific  efforts  have  been  made  to  operationally  define 
the objectives of software human factors, a necessary
step  both  to  focus  research  goals  and  to  provide  a 
framework  for  development  of  general  application 
principles. 

While a large amount of research has been performed on
software features related to ease of use or user
compatibility, most of these studies have been limited to a
few features investigated in a specific context.
Consequently, results from different studies cannot be
integrated, and it is hard to draw conclusions that can be
generalized to other situations. Overriding problems in
the development of principles of software human factors
are the lack of knowledge of how research on software
human factors should be conducted and a paucity of
techniques for measuring performance. For example, little
is known about how to collect user data on "ease of
learning," how to define errors, how to record complex
response-time metrics, and how to measure user
satisfaction.

Researchers interested in the development of principles
for the design of user-compatible software have great
need for guidance in both research methods and performance
measurement techniques. As an initial effort to fulfill
this need, the committee conducted a two-day workshop to
bring together highly qualified researchers with knowledge,
based on studies in diverse fields, about how to design
software to be usable.

The Workshop on Software Human Factors was convened in
June 1983 in Washington, D.C. The impetus for the ...
Factors. The workshop had three goals:

o  To  identify  current  methods  used  to  design  and 
evaluate  human  factors  aspects  of  software, 
including  overall  design  and  methods  for  collecting 
data  on  user  performance; 

o  To  ascertain  what  we  know  from  software  research 
results  that  we  did  not  know  10  years  ago;  and 

o  To  identify  new  research  methods  that  are  needed, 
both  to  develop  design  principles  for  software  and 
to discover how users understand software systems.

A group of 14 nationally recognized, active researchers
in the field of human-computer interaction from both
industry and academia was invited to participate in the
workshop. These workshop members represented a variety
of pertinent disciplines, including human factors,
cognitive psychology, computer science, experimental
psychology, social psychology, and business
administration. The relevant bodies of knowledge
represented by the participants include experimental
design and data analysis, human performance measurement,
software design, information processing, learning, and
attitude assessment. Prior to the workshop, participants
prepared short, informal position papers on the issues for
distribution. To accomplish the goal of collecting the
desired knowledge about the design of software, the group
spent two days listing both the design and evaluation
methods currently in use in developing good software
products and the research methods relevant to
understanding basic issues in user-software interaction;
describing each method and constructing a list of
references in which it is used; categorizing methods
according to their uses in the various stages of software
product development or in more basic research; and
suggesting new methods and techniques, designating their
possible uses, and indicating which appear to have high
near-term payoff.

The  technical  aspects  of  the  workshop  were  organized 
by  committee  members  Nancy  S.  Anderson  and  Alphonse 
Chapanis.  The  meeting  was  chaired  by  Nancy  Anderson. 
The report that follows, edited by Nancy Anderson and
Judith Reitman Olson, is based on discussions from the
workshop  and  written  materials  and  references  contributed 
by  the  participants  during  and  subsequent  to  the  workshop. 
Special  appreciation  is  extended  to  Robert  T.  Hennessy 
and  M.  Jeanne  Richards,  formerly  of  the  committee  staff, 
for  their  contributions  in  making  the  sessions  productive 
and  pleasant;  to  Stanley  Deutsch,  study  director  of  the 
committee,  for  his  contributions  to  the  organization  and 
preparation  of  the  report;  to  Christine  McShane,  of  the 
Commission  staff,  for  editorial  support;  and  to  Anne 
Sprague,  administrative  secretary,  for  secretarial  and 
administrative  support.  They  all  helped  to  usher  this 
report  to  publication. 

Nancy  S.  Anderson,  Chair 

Workshop  on  Software  Human  Factors 
INTRODUCTION


At  present,  software  for  specific  applications  and 
user-computer  interfaces  are  aggressively  developed  in 
industry,  but  they  are  designed  largely  with  only  the 
designer's  intuition  as  guide  and  often  without  empirical 
testing  with  end  users.  Two  observations  made  in  a 
popular  software  magazine  point  out  the  resulting  problem: 

The  computer  systems  and  software  we  have  today 
are  too  damn  complicated  for  the  end  user.  There 
is  too  much  to  learn,  too  many  fiddly  details,  too 
much  jargon,  too  much  said  that  shouldn't  be  and 
not enough said that should be . . . (A. Johnson-Laird,
Software News, April 1982).

Data  processing  still  has  one  ongoing  problem  to 
solve:  the  end  user's  dissatisfaction  with 
today's  systems.  The  entire  industry  has  been 
grappling  with  this  problem  of  ergonomics,  or  the 
interface  between  human  and  machine.  In  the  case 
of  data  processing,  ergonomics  involves  the 
development  of  "user-friendly"  systems  which  can 
be  operated  by  the  user  at  the  terminal  and  which 
generate  results  that  the  user  can  understand  and 
utilize (M. Parks, Software News, February 1983).

Because of such difficulties, some industry and
academic research groups are developing an interest in
gathering and building appropriate guidelines from basic
research and incorporating these guidelines and
observations of users' behavior into the design process. A
new field has emerged called software psychology or the
psychology of human-computer interaction. It is in a
very exciting state: a relatively new amalgam of
experimental/cognitive psychology, computer science,
business, and engineering.

The field is growing in a variety of sectors. There
are more human factors groups in industry than ever
before. Approximately 50 universities in this country
and abroad have PhD programs in human-computer
interaction, which are housed in psychology, computer
science, social sciences, engineering, business, and
English departments (Mantei and Smelcer, 1984). Many more
schools offer one or more courses in the area. The
Association for Computing Machinery has a Special Interest
Group for Computer-Human Interaction (SIGCHI). The Human
Factors Society has a group called the Computer Systems
Technical Group, which is concerned with human factors
aspects of interactive computing systems, the data
processing environment, and software development. Consumer
demand for computers is increasing at a rapid pace, and
many schools are acquiring computers for tutoring and for
the word-processing and mathematical tools that they
provide. The systems that sell are those that provide the
right usability and functionality, that is, the right
design for the end user.


THE  NEED  FOR  NEW  METHODS 

Designing  systems  to  fit  the  end  user  is  a  difficult 
process.  The  field  is  searching  for  new  methods. 
Classical  experimental  designs  (e.g.,  controlled 
factorial  designs)  may  not  be  appropriate  for  industrial 
settings  in  which  cost-effectiveness  and  timeliness  are 
major concerns. However, tests of single, intuition-driven
designs with users, measuring their performance
and  satisfaction,  do  not  advance  our  general  knowledge 
about  designs  and  do  not  indicate  why  certain  features 
are  good  or  bad. 

There  are,  however,  hybrid  methods  being  used  in 
industry,  and  new,  more  complex  laboratory  tests  being 
constructed to assess users' performance in and
understanding of complex systems. These methods are described
below,  along  with  their  advantages  and  disadvantages  and 
where  they  fit  into  the  product  development  cycle.  Each 
method  is  annotated  with  references  to  a  few  key  articles 
that  report  its  use. 
THE PRODUCT DEVELOPMENT CYCLE

Software  products  are  typically  developed  in  three 
general  stages: 

1.  Analysis — the  product's  functionality  and 
initial  hardware/software  constraints  are 
determined,  analysis  is  made  of  the  product's 
projected  costs  and  benefits,  and  a 
development  schedule  is  projected. 

2.  Design — the  product  is  designed,  first  at  the 
level  of  functional  specifications  and  later 
in  complete  detail,  then  coded  and  tested, 
ending  with  a  running  system. 

3.  Implementation — the  product  is  distributed  and 
installed  in  its  final  locations,  and  users 
are  trained  and  then  operate  the  equipment. 

At all three stages, human factors considerations
appear:

1.  In  assessing  users'  needs  and  capabilities 
during  the  analysis  phase; 

2.  In  designing  and  redesigning  the  system  with 
human  factors  principles  of  usability,  and  in 
testing  prototypes  with  end  users  during  the 
design  stage;  and 

3.  In  monitoring  use  of  the  system  after  its 
implementation,  gathering  information  for 
redesign  to  correct  errors  or  to  add  new, 
useful  features. 

In  what  follows  the  methods  appropriate  to  each  of 
these  stages  are  described.  These  methods,  or  their 
variants,  are  useful  for  both  laboratory  research  and 
industry. They may be used in the slower, more
controlled environment of the laboratory, where research is
designed  to  study  people's  performance  on  complex  tasks. 
And  they  contribute  equally  to  design  and  evaluation  in 
industry,  where  timeliness  is  frequently  considered  to  be 
more  important  than  the  ability  to  generalize  from  the 
results. 


HUMAN  FACTORS  METHODS  IN  RESEARCH  AND  PRODUCT  DESIGN 


ANALYSIS:  GATHERING  IDEAS 

The  ideas  behind  products  typically  arise  from  three 
major sources: from the redesign of an existing product,
from  an  identified  need  in  the  marketplace,  and  from  a 
new  technological  capability  that  provides  a  useful  new 
function  to  users.  Information  about  the  success  of 
existing  products  can  be  obtained  either  by  asking  their 
users  for  their  opinions  and  uses  of  the  systems  or  by 
gathering  unobtrusive  data  about  their  use.  Information 
about  a  new  product  can  come  from  reports  of  needs  from 
potential  users. 


Reports  from  users 

Questionnaires and interviews are the most common
methods  for  gathering  information  about  the  success  of  a 
product  or  the  needs  for  new  functions  or  a  new  product. 
Both  questionnaires  and  interviews  are  good  methods  for 
eliciting  information  about  how  a  person  goes  about  his 
or  her  work,  what  aids  or  tools  he  or  she  uses  or  desires, 
what  kind  of  knowledge  or  training  is  required  to  do  the 
work,  what  difficulties  he  or  she  reports  about  the  work, 
where the work originates and where it goes, what
interactions are necessary with other people to do the work,
and  how  the  user  thinks  the  work  process  could  be 
improved.  Questionnaires  are  more  rigid  in  format  than 
interviews,  since  interviews  can  go  where  the  interviewee 
leads,  often  uncovering  unanticipated  new  information. 

The  principal  disadvantage  of  interviews,  however,  is 
that  they  are  time-consuming;  only  one  person  can  be 
interrogated  at  a  time.  By  aggregating  information  from 
a number of interviewees or questionnaires, one can
construct  a  general  picture  of  users'  needs  and  construct 
some  tentative  system  concepts  for  helping  the  users  do 
their  work  (Kelley  and  Chapanis,  1982;  Rosson,  1983). 

Diaries provide a similar form of informal data
gathering and are used to uncover the needs and
capabilities of the potential users of a new product. Data
about  work  can  be  gathered  in  detail  over  a  long  period 
of  time,  especially  about  how  much  time  particular  kinds 
of  activities  take  and  their  sequential  dependencies. 
Because  a  shorter  time  elapses  between  the  occurrence  of 
an  event  and  its  report,  diaries  give  a  more  accurate 
record  of  actual  activity  than  retrospective  reports  in 
questionnaires and interviews (Mantei and Haskell, 1983).
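As a minimal illustration of how diary data can be reduced, the Python sketch below totals the time a user records for each activity. The entry format, activity names, and times are hypothetical; a real diary study would define its own activity categories and recording scheme.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical diary kept by one user over a morning:
# (activity, start time, end time). Names and times are illustrative only.
DIARY = [
    ("read mail", "09:00", "09:20"),
    ("draft report", "09:20", "10:05"),
    ("phone call", "10:05", "10:15"),
    ("draft report", "10:15", "11:00"),
]


def minutes(start: str, end: str) -> float:
    """Minutes elapsed between two HH:MM times on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 60


def time_per_activity(entries):
    """Total the minutes recorded for each activity across all entries."""
    totals = defaultdict(float)
    for activity, start, end in entries:
        totals[activity] += minutes(start, end)
    return dict(totals)


if __name__ == "__main__":
    for activity, total in sorted(time_per_activity(DIARY).items()):
        print(f"{activity:<14}{total:6.1f} min")
```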

A  common  marketing  technique  for  gathering  information 
about  existing  or  potential  users'  needs  is  the  focus 
group.  Instead  of  interviewing  a  single  user  at  a  time, 
groups  of  users  who  are  either  similarly  trained  or  who 
share  common  goals  are  first  told  about  some  potential 
capabilities  of  a  system,  then  asked  to  discuss  how  they 
might  find  uses  for  these  capabilities.  Occasionally 
active  brainstorming  from  these  sessions  generates  very 
good ideas. The same kind of method is used to collect
opinions about an existing product and to ask for
suggestions for improvements. Often designers will gather
expert users of a system and ask their opinion about how
to improve the system or how to design a new,
computer-based tool for aiding their work (Al-Awar et al.,
1981). The advantage of such methods is that the
participants stimulate each other's thoughts, uncovering
ideas or suggestions they may not have thought of
individually. That is also their disadvantage: a
participant's true opinions can be swayed by group pressure.


Inferring  Needs  from  Natural  Observation 

One  of  the  main  drawbacks  of  the  methods  listed  above 
is  that  they  rely  on  users'  perceptions  of  their  needs 
and  capabilities.  Sometimes  new  products  meet  needs 
unforeseen  by  their  users;  sometimes  users,  either 
consciously  or  unconsciously,  distort  their  daily  work 
activities  and  feelings  about  existing  working  conditions. 
In  such  cases,  it  may  be  better  to  collect  information, 
not  by  asking  users,  but  by  watching  their  behavior  and 
inferring  their  needs  and  capabilities  from  their 
activities. 
Two methods are often used to collect information
about users' behavior in natural work settings. In the
case of activity analysis, an observer watches and
records certain behaviors of the workers. The data may
be  collected  by  direct  observation  or  by  analyzing  video 
or  film  recordings.  Individual  samples  of  categorized 
activities  are  aggregated  into  activity  frequency  tables, 
graphs,  or  state  transition  diagrams.  Such  performance 
analyses  are  particularly  useful  in  assessing  the  changes 
made  in  work  by  comparing  activity  before  and  after  a  new 
system  or  design  change  is  implemented  (Hartley  et  al., 
1977;  Hoecker  and  Pew,  1980). 
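The aggregation step described above can be sketched in Python as follows, assuming each observation has already been coded into one activity category per sampling interval (the category names are hypothetical). The frequency table and the counts of changes between activities correspond to the frequency tables and state transition diagrams mentioned in the text.

```python
from collections import Counter

# Hypothetical coded observation record: one activity category per
# sampling interval, as an observer (or a pass over video) might note it.
SAMPLES = ["type", "type", "read", "phone", "type", "read", "read", "type"]


def frequency_table(samples):
    """How often each activity category was observed."""
    return Counter(samples)


def transition_counts(samples):
    """How often one activity was followed by a different one.

    Each (from, to) pair is an edge of a state transition diagram;
    consecutive samples of the same activity are not counted as transitions.
    """
    return Counter((a, b) for a, b in zip(samples, samples[1:]) if a != b)


if __name__ == "__main__":
    print(frequency_table(SAMPLES).most_common())
    for (a, b), n in sorted(transition_counts(SAMPLES).items()):
        print(f"{a} -> {b}: {n}")
```

Running the same functions on records collected before and after a design change gives the kind of before-and-after comparison the text describes.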

Logging  and  metering  techniques  involve  observations 
of  what  a  user  does  with  a  system,  but  the  measurement  is 
embedded directly into the software. These procedures
can include a simple record with a time-stamp of every
interaction that a user makes with the computer, or they
can involve a complete hard copy representation of a
sequence of particular display frames. Powerful logging
and  metering  software  can  also  categorize  certain 
recognizable  events  and  summarize  their  times.  For 
example,  one  could  summarize  such  events  as  time  to 
complete  a  task,  user  and/or  system  response  time,  and 
frequencies  and  types  of  errors. 
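A minimal sketch of the summarizing step, assuming a hypothetical log in which each record carries a time-stamp, its source (user or system), and an event type. It computes the summaries named above: time to complete the task, system response times, and the number of errors.

```python
from dataclasses import dataclass


@dataclass
class LogEvent:
    t: float      # seconds since the session started
    source: str   # "user" or "system"
    kind: str     # e.g. "command", "display", "error" (hypothetical types)


# Hypothetical time-stamped log of one short task.
LOG = [
    LogEvent(0.0, "user", "command"),
    LogEvent(0.8, "system", "display"),
    LogEvent(5.2, "user", "command"),
    LogEvent(5.5, "system", "error"),
    LogEvent(9.0, "user", "command"),
    LogEvent(10.0, "system", "display"),
]


def summarize(log):
    """Task time, system response times, and error count for one log."""
    task_time = log[-1].t - log[0].t
    response_times = []
    pending = None  # time of the latest user input not yet answered
    for ev in log:
        if ev.source == "user":
            pending = ev.t
        elif pending is not None:
            response_times.append(ev.t - pending)
            pending = None
    errors = sum(1 for ev in log if ev.kind == "error")
    return {"task_time": task_time, "response_times": response_times, "errors": errors}


if __name__ == "__main__":
    print(summarize(LOG))
```

Pairing each user input with the next system event is a simplifying assumption; a real system may produce several outputs per input, and the log format would have to say which one counts as the response.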

Logging and metering procedures are typically embedded
in the operational software. Where access to such software
is limited, one can connect a second computer in tandem to
the first and direct data about the user's activities to
it, in essence providing a "passive tap." In this way,
logging does not interfere with system
response  times,  and  information  about  the  user  inputs  and 
the  system  responses  can  be  recorded  in  detail  for  future 
use  (see  Whiteside  et  al.,  1982;  Goodwin,  1982). 


DESIGN: THE INITIAL DESIGN

Designers  go  through  two  stages  in  constructing  an 
initial  design,  either  implicitly,  driven  by  intuition  or 
experience,  or  explicitly,  using  some  or  all  of  the 
detailed  tools  described  below.  First,  the  designers 
decide  what  the  user  is  going  to  do,  conducting  an 
informal  or  formal  task  analysis.  Second,  they  specify 
what  the  interface  will  look  like  and  what  the  dialog 
will  consist  of.  There  are  a  variety  of  methods  that 
apply  to  this  stage,  where  designers  use  informal  or 
formal guidelines, consult end users, or have some
theory-based  judgments  to  draw  on. 


Determining  What  the  User  Needs  to  Do 

The  most  common  form  of  analyzing  the  user's  activities 
is  called  a  task  analysis.  Task  analysis  is  the  process 
of  analyzing  the  functional  requirements  of  a  system  to 
ascertain  and  describe  the  tasks  that  people  perform.  It 
focuses  both  on  how  the  system  fits  within  the  global  task 
the  user  is  trying  to  perform  (e.g.,  prepare  a  report  of 
a  projected  budget)  and  what  the  user  has  to  do  to  use 
the  system  (e.g.,  access  the  application  program,  access 
the  data  files,  etc.). 

Task  analysis  has  two  major  aspects:  the  first 
specifies  and  describes  the  tasks,  and  the  second,  and 
more  important,  analyzes  the  specified  tasks  to  determine 
such  system  or  environmental  characteristics  as  the 
number  of  people  needed,  the  skills  and  knowledge  they 
should  have,  and  the  training  necessary.  The  first  step 
involves decomposition of tasks into their constituent
subtasks and annotating each subtask for its essential
elements and their interdependencies. The second step
involves examination of the actual tasks and
interdependencies, assessing how difficult each is, what
knowledge is required, where the information resides, etc.
Results of task analyses are used not only in writing
functional specifications for a particular application, but
also for assigning work to groups of workers, arranging
equipment in an efficient configuration, determining task
demands on people, and developing operating procedures and
training manuals (see Bullen and Bennett, 1983; Bullen et
al., 1982).
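The decomposition of a task into annotated subtasks is naturally a tree. The sketch below is a hypothetical Python representation, not a standard task-analysis notation; it uses the report-preparation example from the text, while the third subtask and all knowledge annotations are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Task:
    """A node in a hypothetical task hierarchy, annotated with the
    knowledge a user needs in order to carry it out."""
    name: str
    knowledge: list[str] = field(default_factory=list)
    subtasks: list[Task] = field(default_factory=list)

    def leaves(self):
        """Yield the lowest-level subtasks, depth first."""
        if not self.subtasks:
            yield self
            return
        for sub in self.subtasks:
            yield from sub.leaves()

    def required_knowledge(self):
        """All knowledge annotations in this subtree, first occurrence kept."""
        items = list(self.knowledge)
        for sub in self.subtasks:
            items.extend(sub.required_knowledge())
        return list(dict.fromkeys(items))


# The global task and first two subtasks follow the example in the text;
# the third subtask and all knowledge annotations are invented.
budget = Task(
    "prepare a report of a projected budget",
    subtasks=[
        Task("access the application program", ["log-in procedure"]),
        Task("access the data files", ["file names", "directory structure"]),
        Task("enter and revise figures", ["budget categories", "file names"]),
    ],
)

if __name__ == "__main__":
    for leaf in budget.leaves():
        print(leaf.name)
    print(budget.required_knowledge())
```

Collecting the knowledge annotations over the whole tree is one way to feed the second, analytic step: it lists what users must know, which bears on training and on how the functional specification is written.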


Specifying  the  Initial  Design 

An initial system or interface design is constructed
next. With the global tasks the user has to perform
specified as above, the designer groups the subtasks
according to logical function from the perspective of the
user but tempered by system hardware constraints. Then
the actual interface or system details come from three
sources: design guidelines or principles, intuitions of
the designer sometimes aided by intuitions of the users
themselves, and theory-based judgments.
In generating an initial design, the designer can
consult existing design guidelines for general
prescriptions of how to specify particular components of
the interface. For example, if the interface has a menu,
the guideline may prescribe that the alternatives be
listed in order of frequency of use or clustered
according to functional similarity, rather than displayed
alphabetically or randomly. Current design guidelines
(e.g., Woodson and Conover, 1966; Van Cott and Kinkade,
1972) include prescriptions about such topics as the
readability of type fonts, the brightness levels of
display screens, keyboards designed to fit hand shape and
function, and rules for making abbreviations and symbols
(see also Shneiderman, 1982; Smith, 1982).
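To make the menu guideline concrete, here is a sketch that orders menu items by frequency of use while keeping designer-defined functional clusters together. Combining the two prescriptions this way is one possible reading of the guideline, not a rule from the cited sources; in practice the usage counts might come from logging data, and the ones below are made up.

```python
from collections import Counter


def order_menu(groups, usage):
    """Order menu items by frequency of use, keeping functional clusters together.

    groups: mapping of cluster name -> item names (defined by the designer).
    usage:  Counter of how often each item was chosen; unseen items count as 0.
    Clusters with the highest total use come first; within a cluster, the most
    used item comes first. Ties keep the designer's original order.
    """
    def cluster_total(name):
        return sum(usage[item] for item in groups[name])

    ordered = []
    for name in sorted(groups, key=cluster_total, reverse=True):
        ordered.extend(sorted(groups[name], key=lambda item: usage[item], reverse=True))
    return ordered


# Hypothetical clusters and usage counts.
GROUPS = {"file": ["open", "save", "print"], "edit": ["cut", "copy", "paste"]}
USAGE = Counter({"open": 30, "save": 50, "print": 5, "cut": 10, "copy": 40, "paste": 45})

if __name__ == "__main__":
    print(order_menu(GROUPS, USAGE))
```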

Current  guidelines,  however,  are  more  concerned  with 
perceptual  and  performance  characteristics  than  with  the 
cognitive  properties  of  the  interaction.  Thus,  they 
would  prescribe  appropriate  type  fonts,  but  not  what 
words  these  fonts  should  express  to  the  user  to  suggest 
the  appropriate  analogy  for  performing  the  task  on  the 
system.  There  are  several  major  caveats  in  the  use  of 
design  guidelines:  the  prescriptions  or  recommendations 
contained  may  have  been  derived  from  situations  or 
research not applicable to the system being designed; new
or unaccounted-for variables may interact in unanticipated
ways; and current guidelines do not always publish the
source of the recommendation, whether it was generated by
a controlled laboratory study or derived from the
collected wisdom of experience. Guidelines have to be
applied with care.

Though  design  guidelines  have  their  flaws,  they  are 
very  useful  in  placing  a  particular  new  design  in  a 
setting  of  conventional  wisdom.  Often  the  designer, 
skilled  in  interacting  with  systems  and  cognizant  of  the 
end  tasks  that  are  being  supported  in  this  design,  cannot 
foresee  the  difficulties  the  new  user  will  have  with  the 
system.  Design  guidelines  provide  suggestions  to  the 
designer  that  will  in  many  cases  be  better  than  those 
based  solely  on  intuition.  (For  a  recent  version  of 
guidelines,  see  Smith,  1984.) 

The skills and knowledge of users themselves can be
used to advantage by incorporating users in the design
team. Users can provide some critical insights about how
they think of the task and thus the system (e.g., what
kinds of information should be accessible when, what the
screens should look like to mimic the original noncomputer
version of the task, what commands ought to be called).
They know the procedures and terminology and, with proper
support, can contribute to the design and layout of forms
and menus as well as act as critics of the design. Gould
and Lewis (1985) and Miller and Pew (1981) provide
examples of the involvement of users in the design
process. Other ways in which the sophisticated user can be
involved in the design of software systems can be found
below in the section on prototype testing with users.

A third source of information about the original design
specification is psychological theory. Theory-based
judgments can constrain aspects of a design or suggest
promising areas of investigation. For example, theories
of color contrast can provide insight into the
appropriateness of certain color combinations used in
screen highlighting or predict the readability of a new
monochrome display color. Because Fitts's law accounted
for the movement time both for placing a cursor in a
desired position with a mouse and for placing the
appropriate finger on a desired key location, two
conclusions follow: the invention of faster pointing
devices was unlikely to increase performance, and the
design of keyboards with larger peripheral key caps would
increase the accuracy of keying (Card et al., 1978; Card
et al., 1980b).
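Fitts's law predicts movement time from the distance to a target and the target's width. The sketch below uses the Shannon formulation, MT = a + b log2(D/W + 1), which is one of several variants in the literature; the constants a and b must be fitted by regression to pointing data for a given device and user population, and the values used here are placeholders.

```python
import math


def index_of_difficulty(distance: float, width: float) -> float:
    """Fitts's index of difficulty in bits (Shannon formulation)."""
    return math.log2(distance / width + 1)


def movement_time(distance: float, width: float, a: float, b: float) -> float:
    """Predicted movement time MT = a + b * ID.

    a (intercept) and b (time per bit) must be estimated from measured
    pointing times for a particular device and user group.
    """
    return a + b * index_of_difficulty(distance, width)


if __name__ == "__main__":
    # Placeholder constants in seconds, not measured values.
    a, b = 0.2, 0.1
    for width in (1.0, 2.0):
        mt = movement_time(7.0, width, a, b)
        print(f"D=7, W={width}: ID={index_of_difficulty(7.0, width):.2f} bits, MT={mt:.2f} s")
```

Doubling the target width at a fixed distance lowers the index of difficulty, which is consistent with the text's point about larger peripheral key caps.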

Part  of  the  difficulty  in  constructing  a  design  and 
analyzing  its  usability  has  to  do  with  how  the  interface 
is  specified.  Verbal  descriptions  of  how  a  system  works 
are  particularly  unsuited  for  conveying  the  flow  of  an 
interaction  and  the  choices  the  user  has  at  each  point. 
Several  specification  languages  or  formats  have  been 
explored  recently  not  only  to  serve  as  a  way  of  conveying 
to  those  who  actually  build  or  code  the  system  what  it 
will  do  but  also  as  a  way  of  concretely  specifying  the 
system  to  analyze  its  usability. 

One way to specify the interaction is to use an
interactive tool kit called a human-computer dialog
management system. This system guides the definition of
the interaction language that describes the actions of the
user and the system and the screen formats displayed at
each moment. Hartson et al. (1984), Jacob (1983), and
Wasserman (1982) provide good examples of this kind of
interface definition.* A second format for displaying
what the system does at each state is a state transition
diagram, recently used as a description of a system's
workings in Kieras and Polson (1983).

*This is also a system that allows rapid embodiment of
the functioning of a new, developing system and thus is a
tool for rapid prototyping.
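A state transition diagram can be written down directly as a table from (state, user input) pairs to next states. The Python sketch below encodes a small, hypothetical file-handling dialog (the states and inputs are invented) and steps through a sequence of inputs; an input the specification does not define is reported, which exposes gaps in the specification before anything is coded.

```python
# Hypothetical dialog written as a state transition table:
# (current state, user input) -> next state.
TRANSITIONS = {
    ("main menu", "open"): "file prompt",
    ("file prompt", "<filename>"): "editing",
    ("file prompt", "cancel"): "main menu",
    ("editing", "save"): "main menu",
    ("editing", "quit"): "confirm quit",
    ("confirm quit", "yes"): "main menu",
    ("confirm quit", "no"): "editing",
}


def run_dialog(inputs, start="main menu"):
    """Step through user inputs and return the list of states visited.

    Raises ValueError at the first input the specification does not define.
    """
    state = start
    visited = [state]
    for step, user_input in enumerate(inputs, 1):
        key = (state, user_input)
        if key not in TRANSITIONS:
            raise ValueError(f"step {step}: no transition for {user_input!r} in state {state!r}")
        state = TRANSITIONS[key]
        visited.append(state)
    return visited


def inputs_available(state):
    """The choices the user has in a given state, in alphabetical order."""
    return sorted(inp for (s, inp) in TRANSITIONS if s == state)


if __name__ == "__main__":
    print(run_dialog(["open", "<filename>", "quit", "no", "save"]))
    print(inputs_available("editing"))
```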


DESIGN: FORMAL ANALYSIS OF THE INITIAL DESIGN

Once an initial design is specified, even if it is a
partial design, it can be subjected to several kinds of
scrutiny. The goal in this analysis stage is to make the
initial design as good as possible before it is made into
the prototype for user testing. Three methods aid in
this process: structured walk-throughs, decomposition,
and task-theoretic analytic models.

Structured  walk-throughs  involve  construction  of 
tasks  that  a  user  carries  out  on  a  simulated  system.  The 
user  tries  out  the  system  by  going  through  the  task,  step 
by  step,  screen  by  screen,  command  by  command.  This  can 
be  done  with  the  design  as  specified  in  a  number  of 
different  formats,  using  an  experimental  simulation  of  a 
prototype  or  even  with  the  experimenter  presenting  paper 
and  pencil  figures  of  the  screens,  menus,  and  commands  in 
the  appropriate  sequence.  The  technique  helps  to  identify 
confusing,  unclear,  or  incomplete  instructions,  illogical 
or inefficient operations, unnatural or difficult
procedures, and procedural steps that may have been overlooked
because they were implicitly rather than explicitly
defined.  Gould  et  al.  (1983),  Ramsey  (1974),  Ramsey  et 
al.  (1979),  and  Weinberg  and  Friedman  (1984)  provide 
examples  of  the  use  of  structured  walk-throughs. 

A  second  kind  of  formal  analysis,  called  decomposition, 
is  proposed  in  Reitman  et  al.  (1985).  In  this  analysis, 
the  major  components  of  the  design  are  separated  and 
analyzed for their impact on cognition. The picture
displayed on the screen, for example, is assessed for how
it helps or hinders the user's ability to perceive
meaningful relationships or the system model. The commands
are assessed for their load on long-term memory, how easy
they are to remember, and how confusable they are with
one another. For each component, a second design
alternative is constructed to fit within the general guidelines
of  usability.  Then,  through  discussion  and  debate,  the 
design  team  decides  which  alternative  of  each  component 
is  the  better  design.  This  method  encourages  careful 
scrutiny  of  the  proposed  design  and  often  encourages 
designers  to  specify  better  interfaces  before  the  first 
prototype  is  built. 
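The decomposition method relies on judgment and debate, but part of the command assessment can be roughed out mechanically. As a stand-in heuristic (not part of the method in Reitman et al.), the sketch below flags pairs of command names whose spellings are similar, using the ratio from Python's difflib.SequenceMatcher; the command list and the 0.5 threshold are hypothetical, and spelling similarity is only a crude proxy for how confusable commands are to users.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical command set for the component under review.
COMMANDS = ["delete", "delay", "list", "last", "print"]


def confusable_pairs(commands, threshold=0.5):
    """Pairs of command names whose spelling similarity is at least `threshold`.

    Similarity is SequenceMatcher.ratio() (0 = nothing in common, 1 = identical);
    scores are rounded to two places and sorted most similar first.
    """
    pairs = []
    for a, b in combinations(commands, 2):
        ratio = SequenceMatcher(None, a, b).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio, 2)))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


if __name__ == "__main__":
    for a, b, score in confusable_pairs(COMMANDS):
        print(f"{a!r} vs {b!r}: {score}")
```

Flagged pairs are candidates for the design team's discussion, not verdicts; two names can differ in spelling and still be confused because their meanings overlap.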
The third kind of formal technique invokes task-theoretic
analytic models. These models provide representations and
analyses that assess, for example, which parts of a
metaphor aid performance and which do not (Douglas and
Moran, 1983) and how large the user's short-term memory
load is at each step of the interaction (Kieras and
Polson, 1985). Prime examples of these techniques include
metaphor analysis (Carroll and Thomas, 1982; Carroll and
Mack, 1982), assessment of mental models (deKleer and
Brown, 1983; deKleer and Brown, in press; and others in
Gentner and Stevens, 1983), development of production rule
systems that represent the user's knowledge of the task
(Kieras and Polson, 1985), object/action analysis (called
"external/internal task mapping" by Moran, 1983), the GOMS
model (Card et al., 1980b, 1983), and formal grammar
notation systems (Reisner, 1981a, 1984; Blesser and Foley,
1982).
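The timing side of the GOMS family can be illustrated with a Keystroke-Level Model calculation. In this model, predicted execution time is the sum of the times for primitive operators. The sketch below is a minimal illustration, not the analysis procedure of Card et al. It assumes the commonly cited operator estimates:

- K = 0.20 s per keystroke for an average skilled typist
- P = 1.10 s to point with a mouse
- H = 0.40 s to move the hand between devices
- M = 1.35 s of mental preparation

The two task encodings are hypothetical.

```python
# Operator times (seconds): commonly cited Keystroke-Level Model estimates.
# Treat them as assumptions; real values vary with user skill and device.
OPERATOR_SECONDS = {
    "K": 0.20,  # keystroke or button press, average skilled typist
    "P": 1.10,  # point at a target on the screen with a mouse
    "H": 0.40,  # "home" the hand between keyboard and mouse
    "M": 1.35,  # mentally prepare for the next step
}


def klm_time(operators: str) -> float:
    """Predicted execution time for a string of operator codes such as "MKKKK"."""
    return sum(OPERATOR_SECONDS[op] for op in operators)


# Two hypothetical ways to issue the same command.
typed = "M" + "KKKK"  # decide, then type a 3-letter command and Return
menu = "HMPKPK"       # reach for mouse, decide, point+click menu, point+click item
print(f"typed command: {klm_time(typed):.2f} s")
print(f"menu pick:     {klm_time(menu):.2f} s")
```

Comparing such totals can flag a costly procedure before anyone is tested. The model predicts error-free expert time only, so it says nothing about learning or errors.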

These task analytic models are very useful tools.
However, none of them yet encompasses all of the cognitive
aspects of the interaction; each focuses on one or more
important aspects. These methods require training to use
and often take a long time to apply. However, they all
have the advantage of being based on sound theories of
human behavior, and they can provide important analysis of
usability before any software is coded or any subjects are
run. There is a trade-off, then, between time spent in
analysis and time spent testing users in the laboratory or
the field. The hope embodied in this approach is that as
the science of user-interface design grows, analytic tools
will improve to the point of making the actual user
testing of designed systems merely a last, short check of
a good, finished design.


DESIGN: BUILDING A PROTOTYPE

Three methods provide simulations or quick versions of
significant aspects of a new system so it can be tried by
actual users. The methods are called facading, the
Wizard of Oz technique, and rapid prototyping.

Facading is the technique of quickly and inexpensively
building a simulation of the external appearance (i.e.,
the "facade") of a system's interface. Its advantages are
that it is quick and relatively easy; the target system's
underlying complexity and/or final computational
capability is "finessed." To be maximally beneficial, the
facade must embody some level of the functional capability
of the final target system. It does not just generate a
series of static snapshots of the system but rather
includes the control structure, flow, or connectivity of
the final system. Hanau and Lenorovitz (1980) and
Lenorovitz and Ramsey (1977) provide good examples of the
use of this technique.
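The distinction between static snapshots and connectivity can be made concrete with a minimal facade sketched in Python. It consists of canned screen text plus a transition table, and nothing is computed behind the screens. The screen names, inputs, and the `run_script` helper are hypothetical.

```python
# Canned screens: only the appearance, no underlying functionality.
SCREENS = {
    "main": "MAIN MENU\n 1) Find record\n 2) Quit",
    "find": "FIND RECORD\n Enter a name, or B to go back",
    "result": "RECORD FOUND (canned)\n B) Back to main menu",
}

# Connectivity: (current screen, input) -> next screen.
# "*" matches any other input; None ends the session.
TRANSITIONS = {
    ("main", "1"): "find",
    ("main", "2"): None,
    ("find", "B"): "main",
    ("find", "*"): "result",
    ("result", "B"): "main",
}


def next_screen(current, user_input):
    """Exact match first, then the screen's wildcard, else stay on the screen."""
    key = (current, user_input.strip().upper())
    if key in TRANSITIONS:
        return TRANSITIONS[key]
    return TRANSITIONS.get((current, "*"), current)


def run_script(inputs, show=False):
    """Replay a list of user inputs; return the names of screens visited."""
    screen, visited = "main", ["main"]
    if show:
        print(SCREENS[screen])
    for text in inputs:
        screen = next_screen(screen, text)
        if screen is None:
            break
        visited.append(screen)
        if show:
            print(f"\n> {text}\n{SCREENS[screen]}")
    return visited


print(run_script(["1", "smith", "b", "2"], show=True))
```

Because the flow lives in a table, a designer can rewire the facade between test sessions without writing any back-end code.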

A variant of the facading technique is the Wizard of Oz
technique. Instead of having the computer embody the
simulated system, hidden human operators intercept user
commands and provide output back to the user. Often the
technique is used to test a new interface language: the
hidden human operator intercepts the new commands,
translates them into the real system's commands, and,
after receiving output from the real computer system,
translates that output back for the end user being tested
(see Gould et al., 1983; Gould and Boies, 1978; Ford,
1981; Kelley, 1983; Wixon et al., 1983).
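The routing in a Wizard of Oz session can be sketched as follows. The `wizard` argument stands in for the hidden human operator; in a live session it might be a prompt on a second terminal. The proposed commands, the real-system syntax, and the `real_system` stub are all hypothetical.

```python
def real_system(command):
    """Stand-in for the existing system; it understands only its own syntax."""
    if command == "DIR":
        return "report.txt  notes.txt"
    if command.startswith("DEL "):
        return f"deleted {command[4:]}"
    return "?SYNTAX ERROR"


def wizard_session(user_commands, wizard):
    """Pass each command in the proposed language through the hidden wizard.

    The wizard translates it into real-system syntax; only the system's reply
    goes back to the user. The full transcript is kept for later analysis.
    """
    transcript = []
    for typed in user_commands:
        translated = wizard(typed)
        reply = real_system(translated)
        transcript.append((typed, translated, reply))
    return transcript


# Scripted stand-in for the human operator's translations. In a live session
# the wizard might instead be: lambda s: input(f"user typed {s!r}; real command? ")
SCRIPTED = {"show my files": "DIR", "throw away notes.txt": "DEL notes.txt"}

for typed, translated, reply in wizard_session(
        ["show my files", "throw away notes.txt", "rename it"],
        lambda s: SCRIPTED.get(s, s)):
    print(f"user: {typed!r:26} wizard sent: {translated!r:18} user sees: {reply}")
```

The transcript records which proposed commands the wizard could not map ("rename it" above). Those commands are candidates for redesign or for new functionality.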

Rapid or fast prototyping are terms applied to the more
formalized building of a prototype in a hurry. The speed
of building a running system depends mainly on the
underlying supporting software, which makes the specific
prototype programmable from existing modules. Ideally, the
prototype programming language separates elements of the
dialog from the actual implementation software. For
example, the designer can vary the placement of the
command input line or the menu choices without having to
program new modules to execute these different input
formats. One such system, the "dialog management system,"
is under development by Hartson and his colleagues
(Hartson et al., 1984; Yunten and Hartson, 1984); another
system is described in Wasserman (1982) and Wasserman and
Shewmake (1982). Another project that uses rapid
prototyping methods is reported in Hayes et al. (1981).
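The separation between dialog and implementation can be sketched in a few lines. This is a hypothetical illustration, not the design of Hartson's dialog management system or of Wasserman's tools. The application functions are written once. A small `DialogSpec` object (an assumed name) decides whether choices are offered as typed commands or as a numbered menu, and in what order.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


# Application functions: written once, independent of how the dialog looks.
def open_file(name: str) -> str:
    return f"opened {name}"


def print_file(name: str) -> str:
    return f"printed {name}"


ACTIONS: Dict[str, Callable[[str], str]] = {"open": open_file, "print": print_file}


@dataclass
class DialogSpec:
    """Hypothetical dialog description: how choices are offered and parsed."""
    style: str        # "command" (type the action name) or "menu" (pick a number)
    order: List[str]  # order in which actions are offered on a menu

    def prompt(self) -> str:
        if self.style == "menu":
            return "\n".join(f"{i}) {name}" for i, name in enumerate(self.order, 1))
        return "command> "

    def parse(self, text: str) -> str:
        """Map what the user typed to an action name."""
        text = text.strip()
        if self.style == "menu":
            return self.order[int(text) - 1]
        return text.split()[0].lower()


def dispatch(spec: DialogSpec, typed: str, argument: str) -> str:
    """Run the chosen action; the action code never sees the dialog style."""
    return ACTIONS[spec.parse(typed)](argument)


command_line = DialogSpec("command", ["open", "print"])
menu = DialogSpec("menu", ["print", "open"])
print(menu.prompt())
print(dispatch(command_line, "Open", "memo.txt"))  # typed command
print(dispatch(menu, "1", "memo.txt"))             # menu choice 1 = print
```

Changing `menu.order` or switching `style` alters the dialog without touching `open_file` or `print_file`.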


DESIGN: PROTOTYPE TESTING WITH USERS

When a prototype of some form has been built, actual
users are brought in to use the system and report their
opinions about it. These tests can vary greatly in how
well controlled their designs are and in how
representative the tested users are of the final
population of users. Moreover, users are asked to perform
several kinds of tasks: some test the normal, frequent
tasks that regular users will be expected to perform;
others test those subtasks thought to be especially
difficult either for the system (e.g., those producing
long system response times) or for the user (e.g., the
longest sequence of commands for a particular type of
task). Prototype tests also differ in what kinds of data
are taken from the user: times and errors, thinking-aloud
protocols, or attitudes.


Experimental Designs

Field tests to evaluate systems are fashioned after
laboratory tests common in the academic field of
experimental psychology. In general, they require the
comparison of at least two systems that differ in only one
component or variable. Measures are designed to reflect
the performance attributable to the effects of that
variable, and subjects are chosen to be representative of
the population of end users. Of particular importance are
various techniques for controlling irrelevant variables.
For example, one must ensure that the test subjects in the
two conditions do not differ in intelligence; otherwise
such a difference could affect the results in addition to
the effects of the independent variable.
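One standard way to keep a variable such as intelligence from differing systematically between conditions is to assign subjects to systems at random. Task times can then be compared. The sketch below assumes two systems, hypothetical subject IDs, and made-up task times; these are illustrative numbers, not data from any study. It computes Welch's t statistic, which does not assume equal variances in the two groups. Turning t into a p-value needs a t-distribution table or a statistics package, which Python's standard library does not provide.

```python
import random
import statistics
from math import sqrt


def assign(subjects, seed=0):
    """Randomly split subjects into two groups of (nearly) equal size."""
    rng = random.Random(seed)  # fixed seed so the assignment can be reproduced
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def welch_t(a, b):
    """Welch's t statistic for the difference between the means of a and b."""
    se = sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


group_a, group_b = assign([f"S{i:02d}" for i in range(1, 11)])
print("System A:", group_a)
print("System B:", group_b)

# Made-up task completion times (seconds) for the two groups.
times_a = [41, 38, 45, 40, 36]
times_b = [52, 47, 55, 49, 50]
print(f"Welch t = {welch_t(times_a, times_b):.2f}")
```

Random assignment does not guarantee balance in a small sample. It is still worth checking a measured covariate, such as a pretest score, across the two groups.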

Often the rules of good experimental design are violated
in the interest of proceeding quickly. Subjects who differ
from the end users but are more readily available may be
tested; comparisons may be made between two systems that
differ on more than one variable; measures may be taken
that are less sensitive than those that would directly
test why performance on one system is better or worse than
on another; occasionally only one system is tested, and
performance on it is measured against some predetermined
standard (e.g., a 10-minute rule for time to learn a
system). The closer the test is to good experimental
design, the more quickly the findings can advance
knowledge about the important aspects of a good
human-computer interface. However, as is often the case in
development, the goal is not ultimate knowledge but rather
a global assessment of the adequacy of a particular
interface or system. A compromise design procedure is
described in Reitman et al. (1984). Examples of the use of
experimental design are found in Ledgard et al. (1981),
Reisner et al. (1975), Reisner (1977, 1981b), and Williges
and Williges (1982).

One variant of controlled experimental evaluation that
has been found useful in the development of interfaces is
called quasi-experimental design. These designs involve
capturing data at several time intervals, typically over
periods measured in weeks or months. At some point during
the data-capturing intervals, a change or modification of
the system is introduced; the data being captured are
expected to reflect the impact of this change. Some of
these quasi-experimental designs allow for comparisons
with a control group. These designs are hard to control,
since the investigator must typically take existing groups
of users, giving one group the change and the other no
change. Inherent differences between existing groups are a
major worry in evaluating the results. A complete
description of this technique can be found in Cook and
Campbell (1979); Koltum (1982) and Rice (1982) provide
good examples of this method.
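When a quasi-experimental design includes a comparison group, one simple summary is a difference-in-differences. This is the before/after change in the group that received the modification minus the change in the group that did not. The sketch assumes weekly error counts as the measure and uses made-up numbers. The estimate is meaningful only if the two existing groups would have changed alike without the modification, which is exactly the worry about inherent group differences raised above.

```python
from statistics import mean


def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Change in the group given the modification minus change in the group without it."""
    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change


# Made-up weekly error counts, four weeks before and four weeks after the change.
treated_before = [12, 14, 13, 15]
treated_after = [9, 8, 10, 9]
control_before = [11, 13, 12, 12]
control_after = [11, 12, 11, 12]

effect = diff_in_diff(treated_before, treated_after, control_before, control_after)
print(f"Estimated effect: {effect:+.1f} errors per week")
```

Subtracting the control group's own drift keeps a general improvement over the same weeks, such as growing familiarity, from being credited to the modification.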


Selection of Tasks to Perform

There are two reasons to have users try out a prototype
system. The first is to identify points of difficulty for
the user so that those points can be redesigned. The
second is to measure standard use of the system, so that
later changes in hardware can be assessed or so that those
responsible for staffing a large operation can determine
how many people will be needed. For the first purpose,
tasks are selected that stress the system and the user;
these are generally called critical incidents. For the
second purpose, tasks are selected to estimate basic
characteristics of the system's use; these are called
benchmark tests.

In terms of critical incidents, the goal is to set up
situations or tasks that have been shown historically to
tax the user and/or the system and are sufficiently
important that they can make the difference between
success and failure in task or system performance. One
might, for example, require the user to access items
distant from what is being presented on the current screen
or to perform a long command sequence. Such tasks reveal
the load this part of the design places on the user's
ability to picture the underlying structure of the stored
information or to master the mnemonic characteristics and
grammatical rules implied by the command sequences. The
goal is to set up situations in which the data will tell
the designers something about the limits of human or
system performance. These tasks are illustrated in the
work of Al-Awar et al. (1981), Kelley and Chapanis (1982),
and Flanagan (1954).


In benchmark tests, the goals are quite different. The
designer wants to measure the performance times and errors
likely to occur in normal use. The tasks are not designed
to tax the system or the user, but rather to be
representative of the kinds of frequent tasks the system
will normally support. Typically, tasks are constructed to
measure the expected amount of time it takes a new user to
learn a system, the amount of time it takes the user to
perform a set of predefined tasks, and the amount of time
it takes the system to respond to a user's request. A good
study that illustrates the use of this method is the
evaluation of eight text editors by Roberts and Moran
(1983). A study of database interfaces using benchmarks
was done by Mantei and Cattell (1982).
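A benchmark session yields a handful of summary numbers. The sketch below assumes three measures that match the paragraph: learning time, task time, and system response time. It uses made-up values and borrows the 10-minute learning rule mentioned earlier as an example criterion. The function name and dictionary keys are hypothetical.

```python
from statistics import mean, median


def benchmark_report(learn_minutes, task_seconds, response_seconds, learn_limit=10.0):
    """Summarize one benchmark session; keys and criterion are illustrative."""
    within_limit = sum(m <= learn_limit for m in learn_minutes)
    return {
        "mean_learn_min": mean(learn_minutes),
        "share_within_learn_limit": within_limit / len(learn_minutes),
        # Median, because a few slow users can pull the mean task time far up.
        "median_task_s": median(task_seconds),
        "mean_response_s": mean(response_seconds),
    }


report = benchmark_report(
    learn_minutes=[7.5, 9.0, 12.0, 8.5, 11.0],  # made-up values
    task_seconds=[34, 41, 29, 55, 38],
    response_seconds=[0.8, 1.2, 0.9, 2.5],
)
for key, value in report.items():
    print(f"{key:26} {value:g}")
```

Recording the same report before and after a hardware change gives the kind of comparison the benchmark approach is meant to support.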


Kinds of Data Collected

There are four major kinds of data collected in tests
of systems: the time it takes to perform a task, the
frequency and kinds of errors, the goals and intentions
of the users, and the attitude of the user.

The amount of time a task takes (either how long an
entire task takes or how long each successive keystroke
takes) reflects the time it takes the user to perceive
inputs, categorize and plan appropriate actions, and
execute proper responses. Error frequencies and types
reflect the difficulties users have with these processes
and often point to the cause of the error (whether the
erroneous response is similar to one in a similar plan,
was generated from confusion with a similar screen, has a
label that sounds the same as another, etc.). A simple
analysis of users' times and errors is found in Reisner et
al. (1975) and Reisner (1977). A comprehensive analysis of
users' times is found in Card et al. (1980b, 1983). Other
uses of times and errors can be found in Boies (1974),
Rosson (1984), Sheppard and Kruesi (1981), and Thomas and
Gould (1975).
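Keystroke-level timing data usually start as a timestamped log. The sketch below assumes a hypothetical log format of (seconds, key) pairs and uses a made-up log. It derives inter-keystroke intervals and flags long pauses, which may indicate planning. Backspace counts serve as a deliberately crude proxy for errors; a real error analysis would classify errors by cause, as the paragraph describes.

```python
def intervals(log):
    """Seconds between successive keystrokes in a [(seconds, key), ...] log."""
    return [t1 - t0 for (t0, _), (t1, _) in zip(log, log[1:])]


def long_pauses(log, threshold=1.0):
    """Keys preceded by a pause of at least `threshold` seconds."""
    return [key for (t0, _), (t1, key) in zip(log, log[1:]) if t1 - t0 >= threshold]


def corrections(log, undo_keys=("BACKSPACE",)):
    """Crude error proxy: number of keystrokes that undo an earlier keystroke."""
    return sum(1 for _, key in log if key in undo_keys)


# Made-up log: the user types "dek", backs up once, types "l", presses Return.
LOG = [(0.00, "d"), (0.18, "e"), (0.35, "k"), (0.60, "BACKSPACE"),
       (1.90, "l"), (2.10, "RETURN")]

print("intervals:", [round(x, 2) for x in intervals(LOG)])
print("long pauses before:", long_pauses(LOG))
print("corrections:", corrections(LOG))
```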

A more thorough, complicated kind of data to collect
during evaluation involves the user's thinking aloud while
performing the task. Typically the user is video- and
sound-recorded while he or she is performing the tasks.
The recording captures what is said and done, what is
displayed on the screen, what sections of the
documentation are being examined, what parts of the task
instructions the user is reviewing, etc. The most complete
protocols ask the subjects to verbalize their intentions,
what their goals are, and what current plans they have for
reaching their goals. This matters because other behavior
is directly observable, whereas thoughts and plans
typically are not. This method has been used by Mack et
al. (1983), Carroll and Mack (1982), and Card et al.
(1980a) in their studies of skilled text editing. More
complete descriptions of the technique and its advantages
and disadvantages can be found in Lewis (1982), Olson et
al. (1984), and Ericsson and Simon (1980).

A third kind of data collected in evaluation sessions is
the users' opinions about the system's ease of use and
functionality. A common instrument used to scale users'
global attitudes about the system is the evaluation
component of Osgood et al.'s (1957) Semantic Differential
(see Good, 1982, for an example of its use).
Questionnaires and interviews also tap users' reactions to
particular components of the system. One problem with
users' reports, however, is that they are typically
distorted by the users' experience with other, similar
systems. A user may also have difficulty separating the
components of the system. For example, a user who has a
very difficult time using a system may report liking it a
great deal, because he or she recognizes how much easier
it is to perform the task on a computer than with previous
manual methods.
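Scoring the evaluation component of a semantic differential amounts to averaging bipolar ratings after putting every item in the same direction. The sketch assumes three evaluative adjective pairs and a 1-7 scale, with the left adjective at position 1. The pairs, their left/right placement, and the function name are illustrative choices, not a prescribed instrument.

```python
# Each item: (left adjective, right adjective, True if the left pole is favorable).
# Mixing pole directions discourages users from marking one column throughout.
ITEMS = [
    ("good", "bad", True),
    ("unpleasant", "pleasant", False),
    ("valuable", "worthless", True),
]


def evaluation_score(ratings):
    """Mean evaluation on a 1-7 scale where higher means more favorable.

    ratings[i] is the position marked for ITEMS[i]: 1 is next to the left
    adjective, 7 next to the right one.
    """
    if len(ratings) != len(ITEMS):
        raise ValueError("need exactly one rating per item")
    scored = []
    for (_, _, left_favorable), r in zip(ITEMS, ratings):
        if not 1 <= r <= 7:
            raise ValueError("ratings must be between 1 and 7")
        # Reverse-code items whose favorable pole is on the left.
        scored.append(8 - r if left_favorable else r)
    return sum(scored) / len(scored)


print(f"{evaluation_score([2, 6, 3]):.2f}")
```

As the paragraph warns, a high score may reflect relief at leaving manual methods behind rather than approval of this particular design.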


Redesign 

Typically, as the prototype of the original design is
tested, errors are found and revisions suggested. The
methods appropriate to the initial design are also
appropriate at the stage of redesign. This part of the
design process iterates through "fixing" and "testing"
until either an acceptable level of performance or the
deadline for developing the system is reached.


IMPLEMENTATION: MONITORING CONTINUED PERFORMANCE

Just as data were collected in the original conception
and analysis phase of product development, data are
collected on the system as implemented. At this stage,
activity analyses, diaries, logging and metering, and
questionnaires and interviews are all appropriate methods
for assessing whether the product as designed is
performing as predicted in the final environment. If
problems are found in the field, either small corrections
are made in the code (e.g., renaming a command is easy to
do in the code but can have an enormous impact on ease of
use), or a redesign is called for, sending the product
design process back to prototype development or fully
back to the top of the cycle.
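Logging and metering can be sketched as a recorder that does minimal work while the user is interacting and leaves analysis for later. The `CommandMeter` class, its bounded buffer, and the command names are hypothetical. A bounded `deque` keeps memory use fixed by discarding the oldest events once it is full. Counting how often each command appears is one way to see, for example, whether a renamed command is actually being used.

```python
import time
from collections import Counter, deque


class CommandMeter:
    """Hypothetical usage meter that records (timestamp, command) events.

    record() only appends to a bounded deque, so the interactive path stays
    cheap; summaries are computed afterwards, off the user's critical path.
    """

    def __init__(self, capacity=10_000, clock=time.monotonic):
        self.events = deque(maxlen=capacity)  # oldest events drop off when full
        self.clock = clock

    def record(self, command):
        self.events.append((self.clock(), command))

    def frequencies(self):
        """How many times each command appears among the retained events."""
        return Counter(command for _, command in self.events)

    def mean_gap(self):
        """Mean seconds between successive retained events (None if fewer than 2)."""
        stamps = [t for t, _ in self.events]
        if len(stamps) < 2:
            return None
        return (stamps[-1] - stamps[0]) / (len(stamps) - 1)


meter = CommandMeter()
for command in ["edit", "save", "edit", "print", "edit"]:
    meter.record(command)
print(meter.frequencies().most_common(2))
```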


OTHER METHODS


Three additional methods are worth mentioning, though
they do not fit neatly into the scheme above. They
include the dialog specification procedure, experimental
programming, and case studies.

The dialog specification method is a global procedure
that cuts across the first several steps outlined above.
It prescribes a method for developing an interactive
dialog with a system and sets a design standard. The
method includes task analysis and flow charting of user
activities as well as standard means of communicating the
specific design requirements to the programmer. The design
standard describes acceptable screen layouts, interactive
devices and how they are to be used, acceptable command
language syntax, etc., down to a level of detail matched
to the range of applications it is intended to cover. For
example, if all designs concerned telephone management
applications, the specification would deal only with the
range of tasks in this domain. These specifications are
built from human factors principles as well as
accumulated data from user testing. Pew et al. (1979)
describe this method more fully.

Experimental programming is similarly a more global
method for designing systems and interfaces. It is a more
flowing, adaptive technique involving users, designers,
and programmers (sometimes all in the same person).
Someone builds a prototype of a new system with some
fraction of the functionality and some fraction of the
user interface in place. This prototype is then used by a
variety of programmer/users who suggest new features and
revisions to existing functions. As many suggestions as
possible are incorporated into the prototype; the good
features survive, and poor features disappear.
Occasionally, when new features are incompatible with the
old, a competing prototype is built. Sometimes someone
merges the most popular ideas from both. This method is
very informal. The only rules for its application are that
everyone's opinion get a fair hearing and that anyone in
the community can implement a change.

This method allows for a progressively better
understanding of the application as well as of the
computation and interface requirements. Its weakness lies
in its casual nature and in its reliance on the opinions
of users, most of whom are programmers; its strength lies
in its exploratory, evolutionary, democratic nature. One
well-known product that benefited from experimental
programming is the EMACS text editor (Stallman, 1980),
which pioneered such concepts as user customization,
on-line documentation, and a particular command style. In
addition, Teitelman (1972) used experimental programming
to develop the concept called DWIM ("Do What I Mean"),
which included a set of facilities that automatically
corrected predictable errors.
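The flavor of DWIM-style correction can be conveyed with a short sketch. This is not Teitelman's implementation. It substitutes Python's standard `difflib.get_close_matches` as the similarity measure and assumes a hypothetical command list. It corrects only when exactly one known command is close enough, leaving ambiguous typos to the user.

```python
import difflib

COMMANDS = ["delete", "display", "edit", "print", "quit"]  # hypothetical command set


def dwim(word, commands=COMMANDS, cutoff=0.75):
    """Return (command, was_corrected), or (None, False) if no confident guess."""
    if word in commands:
        return word, False
    # Ask for up to two candidates so an ambiguous typo can be detected.
    matches = difflib.get_close_matches(word, commands, n=2, cutoff=cutoff)
    if len(matches) == 1:
        return matches[0], True
    return None, False


for typed in ["print", "delte", "xyzzy"]:
    print(typed, "->", dwim(typed))
```

The cutoff trades missed corrections against wrong guesses. A practical system would probably also confirm a correction with the user before carrying out a destructive command such as delete.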

A third global technique goes under the rubric of case
studies. Case studies involve observation and analysis of
a single user, group, or project. The information
collected may range from informal, subjective impressions
to detailed quantitative data. Because case studies
involve no comparison or control group, they are not very
useful for inferring causality. As a result, they are not
appropriate for building a data base of basic research
results from which to construct theories and principles.
They can, however, be extremely useful for gaining
insights when one is first investigating an area of
interest and for providing concrete demonstrations of the
use of new methods and tools.

An example of a case study in which new insights were
gained about a domain involved the use of the Ada system.
The purpose of the study was to understand the problems
that are likely to arise when the system is first
introduced into an organization (Bailey et al., 1982). A
second case study involved a demonstration of new methods
for designing systems to be embedded in special-purpose
hardware, such as airplanes and tanks (Britton et al.,
1981). The documentation and related products produced by
this case study provide examples that others may use in
trying to apply the methods to their own software
projects. Brooks (1975) documents the use of a case study
in a large computer programming project. And the case
study by Baker (1972) was extremely influential in leading
the structured programming revolution. Others include
Gould and Boies (1978, 1983, 1984) and Heninger (1980).


ADVANCES AND SUCCESSES


Over the last 10 years, it has become clear that
research on the issues surrounding human-computer
interaction is worth doing. The design of the
human-computer interface makes a marked difference in
users' performance. Software products exist that embody
well-designed interfaces derived from human factors input:
the Xerox STAR, Apple LISA, and MACINTOSH work stations
and the Rolm and IBM mail systems are examples. In
addition, major changes in the design of the telephone
directory assistance system, as well as original designs
of telecommunication control devices, were a result of
human factors studies.

Human factors research has also shown the usefulness of
some important generic display and control devices: the
partitioning of screens into windows, icons for the
control of operations and the display of objects, better
help messages, and better defined response and function
keys. In addition, more is known about users' limitations
and adaptability.

Human factors design is also influencing documentation
and training for software use (Felker, 1980). Because
software is available to a wider variety of users, there
is increased public awareness of the need to make
software easy to learn and use.
FUTURE METHODS


Although we have catalogued a variety of methods to be
used in the software design and research process, some
needs for information are still unmet. These research
needs fall roughly into three categories: new theories,
new representations, and new data collection and analysis
methods.


THEORIES 

Three particular kinds of theories are needed.
Automation theories would tell us what should be
automated and what should be assigned to the human
processor. Such theories would also prescribe an
appropriate mix of automation and human control. Some
seeds of such theories can be found in the field of
supervisory control and in office analysis techniques,
but a more explicit theory is needed to prescribe the
best mix of human and computer processing.

Theories of individual differences would tell us about
the different kinds of computer support required and
desired by different user populations. Of special and
continuing interest are the differences between naive or
casual users and expert or dedicated users.

Theories of standardization would tell us which aspects
of a system should be standardized for all users (as in
the basic control devices in an automobile) and which can
be customized by and for specific users.

In addition, two taxonomies are needed. The first is a
characterization of the kinds of tasks for which software
can be built, so that design prescriptions can be tied,
perhaps, to particular classes of tasks. The second is a
characterization of the kinds of users of software
applications, related to the theories of individual
differences described above. The partial taxonomy of
human-computer interface tasks advanced by Lenorovitz et
al. (1984) provides a baseline for this effort.


REPRESENTATION 

Many of our analyses, apart from the testing of a working
system with real end users, require some specification of
what the system can do, what the user knows about how the
system works, and how the user conceives of the task.
There is thus a need for better representational schemes
than those now being used. One such scheme would describe
a complex system so that documentation and training could
be better designed. Another would represent exactly how a
system works (the interface, dialog, communication, or
transaction) so that the design could be both analyzed for
its fit to users' needs and capabilities and conveyed to
those who have to program it.

We need techniques for inferring what a user currently
understands of a system: a method for extracting the
appropriate information from the user and for displaying
the resulting understanding or "mental model." These
techniques are as useful in basic research on the
performance of complex tasks as they are in the applied
design process. (A report of the Committee on Human
Factors' workshop on mental models in the use of
information systems is scheduled for publication in 1985.)


DATA COLLECTION, MEASURES, AND ANALYSES

Although we have a rich variety of measures to collect
from users interacting with a system, we have no direct
measures of the user's affect, nor do we collect any of
the neurophysiological responses that accompany intense
work, frustration, and satisfaction. In addition, there is
a need for better hardware tools for collecting logging
and metering data without slowing the system that the
user normally interacts with. More specific methods are
needed for analyzing the mountain of data that comes from
protocol analysis. These methods should deduce not only
how the user is satisfying his or her task goals and
subgoals, but also the ongoing memory and perceptual loads
on the user and how the user compensates for them in
performing the task. Our task analysis methods need to be
expanded to include more of the cognitive aspects of the
user's performance, such as memory, language, and
perception.

Research methods considered most likely to produce high
payoff in the near future include:

o  Representations of the users' understanding of a
   system;

o  Representations of a dialog to convey the design to
   programmers;

o  More comprehensive task analyses that include memory,
   perceptual, and language considerations as well as
   timing and error predictions; and

o  Hardware advances that allow the collection of logging
   and metering data for tapping the current use of a
   system.


CONCLUSION 


The research needs of the field of software human factors
are growing faster than its scientific data base.
Additional basic research is clearly needed. Educational
programs are now training future researchers and
practitioners in this field. Data in laboratories and
industry need to be collected more systematically and
disseminated more widely. As a compendium of current
methods, their descriptions and evaluations, and
references to existing literature that uses these methods,
this report should help coalesce the field and move it
toward fruitful work in the future.